Groundwork for fusion implementation and assorted bug fixes #18

Merged (5 commits) on Nov 20, 2018

@richardwu (Collaborator) commented Nov 15, 2018

Note: please review the latest commit. The first two commits are outstanding from #15. Update: rebased.

Closes #14 and #16.

Some notable changes:

  • Support for loading fusion and/or repair raw datasets.
  • Imported flights dataset for fusion testing.
  • Restructured test cases to use unittest.
  • Removed unnecessary try/except blocks that caused exceptions to be silenced.
  • Fixed a few bugs that the silenced exceptions had been hiding.
  • Keep attributes/columns in their original case (instead of lowercasing
    everything): modify Postgres queries to quote references to columns.
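
The last item above (quoting column references so that Postgres keeps their original case) can be illustrated with a minimal sketch. Postgres folds unquoted identifiers to lowercase, while double-quoted identifiers keep their case. The helpers `quote_ident` and `build_select` and the table and column names below are hypothetical and are not taken from this PR's code; in real code a driver-level facility such as `psycopg2.sql.Identifier` would usually be preferable to hand-rolled quoting.

```python
def quote_ident(name):
    """Return `name` as a double-quoted SQL identifier (hypothetical helper).

    Embedded double quotes are escaped by doubling them, per standard SQL.
    """
    return '"' + name.replace('"', '""') + '"'


def build_select(table, columns):
    """Build a SELECT statement whose identifiers keep their original case."""
    cols = ", ".join(quote_ident(c) for c in columns)
    return "SELECT {} FROM {}".format(cols, quote_ident(table))


# Hypothetical table and column names, for illustration only.
query = build_select("hospital", ["ProviderNumber", "HospitalName"])
print(query)
```

Without the quotes, a reference to `ProviderNumber` would be resolved as `providernumber`, which no longer matches a column that was created with its original mixed case.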

@richardwu (Collaborator, Author) commented:

Output from hospital dataset on master/HEAD:

...
Precision = 0.92, Recall = 0.69, Repairing Recall = 0.80, F1 = 0.79, Repairing F1 = 0.86, Detected Errors = 437, Total Errors = 509, Correct Repairs = 351, Total Repairs = 380, Total Repairs (Grdth present) = 380
...

Output from hospital dataset with this patch/PR:

Precision = 0.94, Recall = 0.69, Repairing Recall = 0.80, F1 = 0.80, Repairing F1 = 0.87, Detected Errors = 438, Total Errors = 509, Correct Repairs = 351, Total Repairs = 372, Total Repairs (Grdth present) = 372

The drop in false positives (Total Repairs fell from 380 to 372 while Correct Repairs stayed at 351) results from fixing the "consistency" issues with our normalization of values in #16.
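
As a sanity check, the reported scores can be recomputed from the raw counts in the two output lines. The sketch below assumes the usual definitions: precision = correct repairs / total repairs, recall = correct repairs / total errors, repairing recall = correct repairs / detected errors, and F1 as the harmonic mean of precision and the corresponding recall. These formulas are an assumption rather than a copy of evaluate/eval.py, and the function name `scores` is hypothetical, but they do reproduce the rounded values printed above.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def scores(correct_repairs, total_repairs, total_errors, detected_errors):
    """Recompute the summary metrics from raw counts (assumed definitions)."""
    precision = correct_repairs / total_repairs
    recall = correct_repairs / total_errors
    repairing_recall = correct_repairs / detected_errors
    return {
        "precision": round(precision, 2),
        "recall": round(recall, 2),
        "repairing_recall": round(repairing_recall, 2),
        "f1": round(f1(precision, recall), 2),
        "repairing_f1": round(f1(precision, repairing_recall), 2),
    }


# Counts copied from the two output lines above.
master = scores(351, 380, 509, 437)
patch = scores(351, 372, 509, 438)
print(master)
print(patch)
```

Under these assumptions the only inputs that change between the two runs are Total Repairs (380 to 372) and Detected Errors (437 to 438), which is why precision and both F1 scores move while recall stays the same.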

python test_holoclean.py
# Launch tests.
echo "Launching tests..."
python -m unittest discover .

Contributor commented:

See the comments below

@minafarid (Contributor) left a comment:

Thanks, please see minor comments.

@minafarid (Contributor) commented:

Looks good. Please resolve the conflicts and update the Usage section of README.md to refer to the examples folder.

@richardwu (Collaborator, Author) commented:

Rebased and fixed the merge conflicts; this should be good to merge.
